Computing monotone policies for Markov decision processes: a nearly-isotonic penalty approach
نویسندگان
چکیده
This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by exploiting the monotone property. The first stage is a linear program formulated in terms of the joint state-action probabilities. The second stage is a regularized problem formulated in terms of the conditional probabilities of actions given states. The regularization uses techniques from nearly-isotonic regression. While a variety of iterative method can be used in the first formulation of the problem, we show in numerical simulations that, in particular, the alternating method of multipliers (ADMM) can be significantly accelerated using the regularization step.
منابع مشابه
Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
متن کامل
A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES by
This study is concerned with finite Markov decision processes (MDPs) whose state are exactly observable but its transition matrix is unknown. We develop a learning algorithm of the reward-penalty type for the communicating case of multi-chain MDPs. An adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties are constructed under the average expect...
متن کاملOn the Class of Closed Dynamic Programs
This paper considers a class of general dynamic programs which satisfies the mono tonicity and contraction assumption, and in which the sets of cost functions and policies are closed under the monotone contraction operators. This class of dynamic programs includes, piecewise linear, affine dynamic programs, partially observable Markov decision processes, and many sequential decision processes u...
متن کاملUtilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...
متن کاملIdentification of optimal policies in Markov decision processes
In this note we focus attention on identifying optimal policies and on elimination suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present unified approach to value iteration algorithms that enables to generate lower and upper bounds on optimal values, as well as on the current policy. Using the mod...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1704.00621 شماره
صفحات -
تاریخ انتشار 2017